Abstract: Human coronavirus (HCoV), a member of the Coronaviridae family, is the causative
agent of upper respiratory tract infections and "atypical pneumonia". Despite severe epidemic
outbreaks on several occasions and the lack of antiviral drugs, little progress has been made
toward an epitope-based vaccine for HCoV. In this study, a computational approach
was adopted to identify a multiepitope vaccine candidate against this virus that could
trigger a significant immune response. Sequences of the spike proteins were collected from a
protein database and analyzed with an in silico tool to identify the most immunogenic protein.
The peptides were checked for both T cell and B cell immunity, to ensure that they had
the capacity to induce both humoral and cell-mediated immunity. The peptide sequence spanning
amino acids 88-94 and the sequence KSSTGFVYF were found to be the most potent B cell and
T cell epitopes, respectively. Conservancy analysis, also done with in silico
tools, showed a conservancy of 64.29% for all epitopes. The peptide sequence could interact
with as many as 16 human leukocyte antigens (HLAs) and showed high cumulative population
coverage, ranging from 75.68% to 90.73%. The epitope was further tested for binding to
the HLA molecules, using in silico docking techniques, to verify the binding cleft-epitope
interaction. The allergenicity of the epitopes was also evaluated. This computational study of
the design of an epitope-based peptide vaccine against HCoVs allows us to determine novel
peptide antigen targets in spike proteins on intuitive grounds, although the preliminary results
require validation by in vitro and in vivo experiments.
Keywords: vaccinomics, HLA, atypical pneumonia, allergenicity, docking 

Introduction 

Human coronavirus (HCoV) belongs to the Coronaviridae family (alphacoronavirus 1)
and comprises a large group of enveloped, positive-sense, single-stranded polyadenylated
RNA viruses. It has the largest known viral RNA genomes, ranging from 27.6 to 31.6
kb. Usually, coronaviruses are classified into three groups (group I to III), based on their
serological cross-reactivity. Their classification is also supported by evolutionary
analysis. The group I viruses are animal pathogens, including porcine epidemic diarrhea virus
and feline infectious peritonitis virus. The group II viruses are responsible for domestic
animal pathogenic infections, and the final group III viruses are responsible for avian
species infection. However, both group I and group II viruses include HCoVs.
The proteins that usually contribute to the structure of all coronaviruses are the
spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. HCoV is usually the
causative agent of upper respiratory tract infections and also of "atypical
pneumonia", which was first identified in the People's Republic of China. Because
these viruses now show environmental resistance, it is
urgent to develop effective prevention for HCoV. Currently,
there is no available treatment or vaccine to cure HCoV
infections. Due to the ever-rising spread of this viral infection, the
development of vaccines or antiviral drugs against HCoV
infections is crucial.

A novel approach integrating immunogenetics and
immunogenomics with bioinformatics for the development
of vaccines is known as vaccinomics. This approach has
been used to address the development of new vaccines. The
present conventional approach for vaccine development relies
on antigen expression, in sufficient amount, from in vitro
culture models; however, many antigens, while expressed
sufficiently, may not be good vaccine candidates. Because these
conventional approaches are time-consuming, it has not been
possible to use them to control outbreaks of viral pathogens, such as recent
avian and swine influenza strains.
Hence, the rapid in silico informatics-based
approach has gained much popularity with the recent
advancement in the sequencing of many pathogen genomes
and in protein sequence databases. The "vaccinomics" approach
has already proven to be essential for combating diseases such
as multiple sclerosis, malaria, and tumors. These
methods of vaccine development usually work through the
identification of human leukocyte antigen (HLA) ligands
and T cell epitopes, which specify the selection of the potent
vaccine candidates associated with the transporter of antigen
presentation (TAP) molecules. Allergenicity assessment is
one of the vital steps in the development of a peptide vaccine
because a vaccine introduced into the human body is
detected as a foreign substance and can provoke inflammation,
indicating an allergic reaction. For the prediction
of a B cell epitope, hydrophilicity is an important criterion,
and hydrophilic residues are usually found in beta-turn regions. These assessments
strengthen the case for the vaccine candidates. Therefore,
our present study was undertaken to design an epitope-based
peptide vaccine against HCoVs (229E, NL63, HKU1, EMC,
and OC43) using the vaccinomics approach, with the expectation
that wet lab researchers will validate our predictions.

Materials and methods 

The flow chart summarizing the protocols for the complete
epitope prediction is depicted in Figure 1.

Viral strain selection 

ViralZone, a database of the ExPASy Bioinformatics
Resource Portal, was used for the selection of HCoVs and
their associated information, including their genus, family,
host, transmission, disease, genome, and proteome.

Protein sequence retrieval 

The outer membrane protein (spike protein) sequences of
HCoV were retrieved from the UniProtKB database. All
the sequences were then stored in FASTA format for further
analysis.
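
As an illustration of the storage step, the following is a minimal sketch of reading FASTA records into memory. It is not the authors' pipeline; the function name `parse_fasta` and the two example records (placeholder accessions and placeholder sequences) are assumptions made only for this example.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may wrap over several lines
    return records


# Hypothetical records: placeholder headers and sequences, not real spike proteins.
example = """>tr|EXAMPLE1|EXAMPLE1_CVHNL
MKLFLILLVL
PLASC
>tr|EXAMPLE2|EXAMPLE2_CVH22
MFVLLVAYAL
""".splitlines()

records = parse_fasta(example)
```

Keeping the full header as the key preserves the accession, so later steps can report results per sequence.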

Evolution analysis 

For the analysis of the evolutionary divergence in the
outer membrane (spike) proteins of HCoV, a phylogenetic tree was
constructed, using the ClustalW2 multiple sequence
alignment tool.

Antigenic protein identification 

VaxiJen v2.0, a server for the prediction of protective
antigens and subunit vaccines, was used to determine
the most potent antigenic protein. The default parameters
of this server were used.

T cell epitope identification

The NetCTL 1.2 server was used for the identification
of the T cell epitope. The prediction method integrated
peptide major histocompatibility complex class I (MHC-I)
binding, proteasomal C-terminal cleavage, and TAP transport
efficiency. The epitope prediction was restricted to 12 MHC-I
supertypes. MHC-I binding and proteasomal cleavage were
predicted with artificial neural networks, and a weight
matrix was used for TAP transport efficiency. The threshold
for this analysis was set at 0.5, corresponding to a
sensitivity of 0.89 and a specificity of 0.94.
This allowed us to recruit more epitopes for further analysis.
A combined algorithm of MHC-I binding, TAP transport
efficiency, and proteasomal cleavage efficiency was selected
to predict overall scores.
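
The way a combined score of this kind can be formed and thresholded is sketched below. This is an illustration under stated assumptions, not the NetCTL implementation: the weighted-sum form and the weights 0.15 (cleavage) and 0.05 (TAP) are assumed here as plausible server defaults, the component values are hypothetical, and only the 0.5 threshold and the choice of the five best peptides come from the text.

```python
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    peptide: str
    mhc: float       # MHC-I binding score
    cleavage: float  # proteasomal C-terminal cleavage score
    tap: float       # TAP transport efficiency score


W_CLEAVAGE = 0.15  # assumed weight, not stated in the paper
W_TAP = 0.05       # assumed weight, not stated in the paper
THRESHOLD = 0.5    # threshold given in the text


def combined_score(p: Prediction) -> float:
    """Weighted sum of the three component scores (assumed form)."""
    return p.mhc + W_CLEAVAGE * p.cleavage + W_TAP * p.tap


def select_epitopes(preds: List[Prediction],
                    threshold: float = THRESHOLD,
                    top_n: int = 5) -> List[Tuple[str, float]]:
    """Keep peptides at or above the threshold and return the top_n by score."""
    scored = [(combined_score(p), p.peptide) for p in preds]
    kept = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return [(pep, round(score, 4)) for score, pep in kept[:top_n]]


# Hypothetical component scores for two placeholder 9-mers.
candidates = [
    Prediction("AAAAAAAAA", mhc=0.40, cleavage=0.90, tap=1.00),
    Prediction("CCCCCCCCC", mhc=0.30, cleavage=0.50, tap=0.20),
]
selected = select_epitopes(candidates)
```

Lowering the threshold admits more peptides at the cost of specificity, which is the trade-off the sensitivity and specificity figures above describe.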

A tool from the Immune Epitope Database (IEDB) was used
to predict the MHC-I binding. Of the available prediction
methods, the stabilized matrix method (SMM) was used to calculate the half-maximal
inhibitory concentration (IC50) values of peptide binding
to MHC-I molecules.
For the binding analysis, all the alleles were selected, and
the peptide length was set at 9 before prediction was done. For
the selected epitopes, a web-based tool was used to predict
proteasomal cleavage, TAP transport, and MHC-I binding. This
tool combines predictors of proteasomal processing, TAP
[Figure 1 flow chart; box text recovered from the figure, in reading order]

Strain selection from ViralZone

Retrieving sequences of spike proteins of HCoV

Divergence analysis, using ClustalW2

Antigenic protein determination, using VaxiJen

Most antigenic protein selected for further analysis

Prediction of best possible B cell epitopes from the IEDB tools, including:
□ Kolaskar and Tongaonkar antigenicity
□ Emini surface accessibility
□ Karplus and Schulz flexibility
□ Bepipred linear epitope
□ Chou and Fasman beta-turn prediction

Select the most potent B cell epitope

T cell epitope prediction, using NetCTL, by proteasomal C-terminal cleavage, TAP transport
efficiency, and MHC class I binding analysis (at threshold 0.5)

Best 5 epitopes were selected for further analysis

Epitopes with IC50 <200 nM value for their binding to class I molecules were chosen, and their MHC
class I interaction analysis, using the IEDB combined (cleavage/TAP transport/MHC class I) predictor

Epitope conservancy analysis, using the IEDB conservancy analysis tool

Population coverage analysis, using the IEDB analysis resource

Allergenicity assessment by AllerHunter

Prediction of the 3D structure of the epitope, using PEP-FOLD

Docking analysis with the selected HLA (HLA-B*15:01) and the proposed epitope

Figure 1 Flow chart summarizing the protocols for the complete epitope prediction. 

Abbreviations: 3D, three dimensional; IC50, half-maximal inhibitory concentration; HCoV, human coronavirus; HLA, human leukocyte antigen; HLA-B, major
histocompatibility complex, class I, B; IEDB, Immune Epitope Database; MHC, major histocompatibility complex; TAP, transporter of antigen presentation.



transport, and MHC-I binding to produce an overall score 
for each peptide's intrinsic potential as a T cell epitope. 
SMM was also implemented for this prediction. 
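
The two mechanical steps around the binding prediction, enumerating 9-mers and keeping alleles with IC50 below 200 nM, can be sketched as follows. This is an illustration only: the fragment is a short stretch assembled to be consistent with the peptides listed in Tables 3 and 5 (its numbering starting at residue 4 is an assumption), and the IC50 values are hypothetical, not IEDB output.

```python
from typing import Dict, List, Tuple


def nonamers(sequence: str, start: int = 1, length: int = 9) -> List[Tuple[int, str]]:
    """Return (position, peptide) for every window of the given length.

    `start` is the residue number of the first character of `sequence`.
    """
    return [(start + i, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def strong_binders(ic50_by_allele: Dict[str, float],
                   cutoff_nm: float = 200.0) -> Dict[str, float]:
    """Keep alleles whose predicted IC50 is below the cutoff (lower = tighter binding)."""
    return {allele: ic50 for allele, ic50 in ic50_by_allele.items() if ic50 < cutoff_nm}


# Illustrative fragment, assumed to begin at residue 4 of the selected protein.
fragment = "CLCPVPGLKSSTGFVYFN"
windows = nonamers(fragment, start=4)

# Hypothetical IC50 values in nM for one peptide.
hypothetical_ic50 = {"HLA-B*15:01": 150.0, "HLA-A*02:01": 950.0}
kept = strong_binders(hypothetical_ic50)
```

Because IC50 is a concentration, a smaller value means stronger predicted binding, which is why the filter keeps values below the cutoff.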

Epitope conservancy analysis 

For the analysis of the epitope conservancy, the web-based
tool from the IEDB analysis resource was used.
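
Conservancy of this kind is, in essence, the percentage of sequences that contain the epitope. The sketch below assumes an exact-match (100% identity) criterion, which is an assumption about the tool settings rather than something stated here; the toy sequences are placeholders. Under that assumption, a value of 64.29% over 56 sequences would correspond to 36 matching sequences (36/56 ≈ 0.6429), although that count is an inference and is not reported in the paper.

```python
from typing import Sequence


def conservancy(epitope: str, sequences: Sequence[str]) -> float:
    """Percentage of sequences that contain the epitope as an exact substring."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for seq in sequences if epitope in seq)
    return 100.0 * hits / len(sequences)


# Toy data set: 36 placeholder sequences with the epitope, 20 without.
with_epitope = ["MAAKSSTGFVYFNLL"] * 36
without_epitope = ["MAAKSATGFVYFNLL"] * 20
value = conservancy("KSSTGFVYF", with_epitope + without_epitope)
```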

Prediction of population coverage 

Population coverage for each individual epitope was calculated
with the IEDB population coverage calculation tool analysis
resource. Here we used the allelic frequency of the interacting
HLA alleles for the prediction of the population coverage for
the corresponding epitope.
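
How allele frequencies translate into a coverage figure can be sketched with a simplified model. The model below is an assumption made for illustration (Hardy-Weinberg proportions within each locus and independence between loci) and is not claimed to reproduce the IEDB tool; all allele frequencies are hypothetical.

```python
from typing import Dict


def population_coverage(locus_freqs: Dict[str, Dict[str, float]]) -> float:
    """Estimate the percentage of individuals carrying at least one listed allele.

    For each locus, p is the summed frequency of the epitope-binding alleles;
    the chance that both chromosomes lack them is (1 - p) ** 2.
    """
    uncovered = 1.0
    for freqs in locus_freqs.values():
        p = min(sum(freqs.values()), 1.0)
        uncovered *= (1.0 - p) ** 2
    return 100.0 * (1.0 - uncovered)


# Hypothetical allele frequencies for one population.
hypothetical_freqs = {
    "HLA-A": {"A*32:01": 0.03, "A*02:50": 0.01},
    "HLA-B": {"B*15:01": 0.06, "B*58:01": 0.02},
    "HLA-C": {"C*03:03": 0.05, "C*12:03": 0.05},
}
coverage = population_coverage(hypothetical_freqs)
```

An epitope that binds alleles at several loci gains coverage quickly, because an individual only needs to carry one of them.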

Allergenicity assessment 

The web-based AllerHunter server was used to predict the
allergenicity of our proposed epitope for vaccine
development. This server predicts allergenicity through a combined
prediction that integrates the Food and
Agriculture Organization (FAO)/World Health Organization
(WHO) allergenicity evaluation scheme with support vector
machine (SVM)-pairwise sequence similarity. AllerHunter
predicts allergens as well as nonallergens with high


Drug Design, Development and Therapy 2014:8 






Oany et al 






specificity. This makes AllerHunter a very useful program
for allergen cross-reactivity prediction.

Molecular docking analysis and HLA allele interaction

Design of the three-dimensional (3D) epitope structure

For the docking analysis, the KSSTGFVYF epitope was
submitted to the PEP-FOLD web-based server for 3D structure
conversion, in order to analyze the interactions with
different HLAs. This server modeled five 3D structures of the
proposed epitope, and the best one was selected for the
docking analysis.

Docking analysis 

To verify the binding between HLA molecules and our
predicted epitope, a docking study was performed using
Molegro Virtual Docker, version 6.0 (CLC bio, Aarhus,
Denmark). Of the HLA alleles that interacted with our proposed
epitope, HLA-B*15:01 was selected for docking on
the basis of the available Protein Data Bank (PDB) structure
deposited in the database.
The PDB structure 1XR8, of Epstein-Barr
virus EBNA-3 complexed with human UbcH6 peptide,
was retrieved from the Research Collaboratory for Structural
Bioinformatics (RCSB) protein database and simplified to
HLA-B*15:01. Finally, the docking was established at a grid
of X: 24.81, Y: 29.16, and Z: 40.59.

Identification of the B cell epitope 

Prediction of potentially immunogenic epitopes in a given
protein sequence may significantly reduce the wet lab effort
needed to discover the epitopes required for the design of
vaccines and for immunodiagnostics. The aim of the
prediction of the B cell epitope was to find the potential antigen
that would interact with B lymphocytes and initiate an
immune response. Tools from IEDB were used to identify the B
cell antigenicity, including the Kolaskar and Tongaonkar
antigenicity scale, Emini surface accessibility prediction,
Karplus and Schulz flexibility prediction, and Bepipred
linear epitope prediction analysis. The Chou and Fasman
beta-turn prediction tool was used because the antigenic
parts of a protein belong to the beta-turn regions.

Results 

Divergence analysis of the retrieved sequences

A total of 56 outer membrane protein (spike protein)
sequences from the different variants belonging to five types
(229E, NL63, HKU1, EMC, and OC43) of HCoVs were
retrieved from the UniProtKB database. Then, the sequences
were subjected to multiple sequence alignment in order to
construct a phylogenetic tree (Figure S1). The phylogram
showed evolutionary divergence among the different strains
of HCoV.

Antigenic protein prediction 

The VaxiJen server assessed all of the retrieved protein
sequences in order to find the most potent antigenic protein.
UniProtKB ID B2KKT9 was selected as the most potent
antigenic protein, with the highest total prediction score of
0.6016. This protein was then used for further analysis.

T cell epitope identification 

With the preselected parameters, the NetCTL server predicted
the potent T cell epitopes from the selected protein sequence.
Based on their high combined scores, the five best epitopes
(Table 1) were selected for further analysis.

MHC-I binding prediction, which was run through SMM,
predicted a wide range of MHC-I allele interactions with the
five T cell epitopes. The MHC-I alleles for which the epitopes
showed higher affinity (IC50 <200 nM) were selected for
further analysis (Table 2).

MHC-I processing (proteasomal cleavage/TAP transport/
MHC-I combined predictor) predicted an overall score
for each peptide's intrinsic potential to be a T cell epitope
from the protein sequence. The proteasome complex
cleaves peptide bonds, converting the proteins into
peptides. The peptides produced by proteasomal cleavage
associate with class I MHC molecules, and the
peptide-MHC complexes are then transported to the cell membrane,
where they are presented to cytotoxic (CD8+) T cells. Here, a higher
overall score for a peptide denotes a higher processing
capability (Table 2).

Among the five T cell epitopes, a 9-mer epitope,
KSSTGFVYF, was found to interact with the most MHC-I
alleles, including HLA-B*27:20; HLA-B*15:17;
HLA-B*15:03; HLA-B*40:13; HLA-A*32:07; HLA-B*58:01;


Table 1 The selected epitopes, on the basis of their overall score predicted by the NetCTL server

Number   Epitopes    Overall score (nM)
1        GSDVNCNGY   2.9177
2        TLQYDVLFY   2.1285
3        YYCFINSTI   1.797
4        KSSTGFVYF   1.7667
5        KTLQYDVLF   1.5846
Table 2 The five potential T cell epitopes, along with their interacting MHC-I alleles and total processing score, and epitope
conservancy result

Columns: Epitope; Epitope conservancy; Interacting MHC-I allele with an affinity of <200 nM
and the total score (proteasome score, TAP score, MHC-I score, processing score)

GSDVNCNGY   64.29%   HLA-C*12:03 (1.77), HLA-A*32:07 (1.40), HLA-B*27:20 (1.39),
                     HLA-A*68:23 (1.30), HLA-C*05:01 (1.02), HLA-B*40:13 (0.50),
                     HLA-A*01:01 (0.46), HLA-C*07:01 (0.36), HLA-A*32:15 (0.21)

KTLQYDVLF   64.29%   HLA-B*15:17 (1.77), HLA-A*32:07 (1.39), HLA-A*32:01 (1.37),
                     HLA-A*68:23 (1.19), HLA-B*58:01 (0.84), HLA-B*40:13 (0.69),
                     HLA-B*15:03 (0.51), HLA-B*57:01 (0.27), HLA-C*12:03 (0.21),
                     HLA-A*32:15 (0.17), HLA-A*26:02 (0.16)

KSSTGFVYF   64.29%   HLA-B*27:20 (1.96), HLA-B*15:17 (1.65), HLA-B*15:03 (1.47),
                     HLA-B*40:13 (1.34), HLA-A*32:07 (1.29), HLA-B*58:01 (0.95),
                     HLA-C*03:03 (0.71), HLA-A*68:23 (0.61), HLA-C*12:03 (0.56),
                     HLA-A*02:50 (0.54), HLA-A*32:01 (0.52), HLA-A*32:15 (0.50),
                     HLA-C*05:01 (0.44), HLA-C*15:02 (0.41), HLA-B*58:02 (0.39),
                     HLA-B*15:01 (0.36)

TLQYDVLFY   64.29%   HLA-A*80:01 (2.00), HLA-B*27:20 (1.84), HLA-A*32:07 (1.44),
                     HLA-C*12:03 (1.24), HLA-A*29:02 (1.07), HLA-A*68:23 (1.01),
                     HLA-A*32:15 (0.83), HLA-C*03:03 (0.83), HLA-C*14:02 (0.72),
                     HLA-B*40:13 (0.39), HLA-B*15:01 (0.39)

YYCFINSTI   64.29%   HLA-C*14:02 (0.73), HLA-A*24:03 (0.33), HLA-A*32:07 (0.33),
                     HLA-A*02:50 (0.21), HLA-B*27:20 (0.12), HLA-C*12:03 (-0.08),
                     HLA-C*03:03 (-0.14), HLA-A*23:01 (-0.33), HLA-A*68:23 (-0.33),
                     HLA-A*32:15 (-0.34), HLA-A*24:02 (-0.70)

Abbreviations: HLA, human leukocyte antigen; MHC-I, major histocompatibility complex class I; TAP, transporter of antigen presentation.



HLA-C*03:03; HLA-A*68:23; HLA-C*12:03; HLA-A*02:50;
HLA-A*32:01; HLA-A*32:15; HLA-C*05:01;
HLA-C*15:02; HLA-B*58:02; and HLA-B*15:01, with higher affinity
(Table 2).

Epitope conservancy and population coverage analysis

The IEDB conservancy analysis tool analyzed the
conservancy of the predicted epitopes; the results are shown in Table 2.
The population coverage of the predicted epitopes is depicted
in Figure 2.

Allergenicity assessment 

The sequence-based allergenicity prediction was
calculated using the AllerHunter tool, and the predicted
allergenicity score of the queried epitope was 0.02 (sensitivity =93.0%,
specificity =79.4%).

Molecular docking analysis 

The predicted epitope bound in the groove of
HLA-B*15:01 with an energy of -17.662 kcal/mol. The docking
interaction was visualized with the PyMOL molecular
graphics system, version 1.5.0.4 (Schrödinger, LLC, Portland, OR,
USA) and is shown in Figure 3.



B cell epitope identification 

Here, we used amino acid scale-based methods for the
identification of potential B cell epitopes. Following this
procedure, we applied several analysis methods to predict
a continuous B cell epitope.

The Kolaskar and Tongaonkar antigenicity
prediction method analyzes antigenicity on the basis of the
physicochemical properties of amino acids and their abundances
in experimentally known epitopes. The average antigenic
propensity of the protein was 1.058, with a maximum of
1.240 and a minimum of 0.920. The antigenic
determination threshold for the protein was 1.00; all values greater
than 1.00 were potential antigenic determinants. We found
that seven epitopes satisfied the threshold value set prior
to the analysis, and they had the potential to elicit a B
cell response. The results are summarized in Table 3 and
Figure 4.

To be a potent B cell epitope, a peptide must have surface
accessibility. Hence, Emini surface accessibility prediction
was performed. The region from amino acid residues 88 to 94
was more accessible. This is described in Figure 5 and
Table 4.

The beta turns are often accessible and considerably
hydrophilic in nature. These are two properties of antigenic
Figure 2 Population coverage, based on MHC-I restriction data. Different HCoV-affected regions were selected for evaluation of the population coverage of the proposed
epitopes.

Notes: In the graphs, the line (-0-) represents the cumulative percentage of population coverage of the epitopes; the bars represent the population coverage for each 
epitope. 

Abbreviations: HCoV, human coronavirus; HLA, human leukocyte antigen; MHC-I, major histocompatibility complex class I; PC90, 90% population coverage. 



regions of a protein. For this reason, Chou and Fasman
beta-turn prediction was done. The region 73-95 (approximately
73-79 and 88-95) was considered a beta-turn region
(Figure 6).

Experimental evidence has shown that
the flexibility of a peptide is correlated with antigenicity.
Hence, the Karplus and Schulz flexibility prediction method
was implemented. In this prediction method, the region
75-95 was found to be the most flexible (Figure 7). Finally,
we ran the Bepipred linear epitope prediction tool.
This program is based on a hidden Markov model, the best
single method for predicting linear B cell epitopes. The result
of analysis with this method is summarized in Figure 8 and
Table 5. By cross-referencing all the data, we predicted that
the peptide sequence spanning amino acids 88-94 is capable of
inducing the desired immune response as a B cell epitope.
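
The cross-referencing step can be illustrated by counting, for each residue, how many of the five methods place it inside a predicted region. The sketch below is not the authors' procedure; the region boundaries are taken from the text and from Tables 3 to 5, and the choice of one representative region per method and of a vote threshold is an assumption made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One region per method near residues 73-99 (start, end inclusive).
REGIONS: Dict[str, Tuple[int, int]] = {
    "Kolaskar-Tongaonkar": (86, 99),  # Table 3, peptide 6
    "Emini": (88, 94),                # Table 4, peptide 3
    "Chou-Fasman": (88, 95),          # text, second beta-turn stretch
    "Karplus-Schulz": (75, 95),       # text, most flexible region
    "Bepipred": (85, 93),             # Table 5, peptide 6
}


def support_per_position(regions: Dict[str, Tuple[int, int]]) -> Counter:
    """Count how many methods cover each residue position."""
    votes: Counter = Counter()
    for start, end in regions.values():
        for pos in range(start, end + 1):
            votes[pos] += 1
    return votes


def consensus(regions: Dict[str, Tuple[int, int]], min_votes: int) -> List[int]:
    """Positions supported by at least min_votes methods, in ascending order."""
    votes = support_per_position(regions)
    return sorted(pos for pos, n in votes.items() if n >= min_votes)


all_five = consensus(REGIONS, min_votes=5)
at_least_four = consensus(REGIONS, min_votes=4)
```

With these boundaries, residues 88-93 are covered by all five methods, and requiring at least four of the five gives 88-94, the region proposed above.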
Table 3 Kolaskar and Tongaonkar antigenicity analysis

Number   Start position   End position   Peptide                           Peptide length
1        4                12             CLCPVPGLK                         9
2        14               21             STGFVYFN                          8
3        26               32             DVNCNGY                           7
4        34               40             HNSVADV                           7
5        54               84             NLKSGVIVFKTLQYDVLFYCSNSSSGVLDTT   31
6        86               99             PFGPSSQPYYCFIN                    14
7        104              126            TTHVSTFVGILPPTVREIVVART           23



Discussion 

The timely development of a new vaccine is
crucial for defending against the ever-rising global burden
of disease. With the advancement of sequence-based




Figure 3 HLA-B*15:01 and epitope KSSTGFVYF interaction analysis. (A) The three
dimensional structure of the epitope KSSTGFVYF. (B) The epitope KSSTGFVYF
binds in the groove of the HLA-B*15:01.

Abbreviation: HLA-B, major histocompatibility complex, class I, B.



technology, we now have enough information about the
genomics and proteomics of different viruses. As a result,
with the help of various bioinformatics tools, we can design
peptide vaccines based on a neutralizing epitope. For
example, the design of epitope-based vaccines against
rhinovirus, dengue virus, chikungunya virus, and Saint
Louis encephalitis virus has already been suggested.
Though epitope-based vaccine design has become a familiar
concept, not much work has yet been done in the case of
HCoV. HCoV is an RNA virus, which tends to
mutate more frequently than DNA viruses. These
mutations mostly occur in the outer membrane protein, ie,
the spike protein. They increase the
sustainability of HCoVs by ensuring their escape from
both the cell-mediated and humoral immune responses.
Despite this, spike proteins have the most potential as a
target for vaccine design because of their ability to induce a
faster and longer-term mucosal immune response than
the other proteins, and for this reason they have gained much
popularity with researchers. From this perspective, a universal
HCoV vaccine needs to be designed, in order to overcome
the adverse effects of this viral infection.

At present, vaccines are mostly based on B cell immunity.
Recently, however, vaccines based on T cell epitopes have been
encouraged, as the host can generate a strong immune
response by CD8+ T cells against infected cells. With
time, due to antigenic drift, any foreign particle can escape
the antibody memory response; however, the T cell immune
response often provides long-lasting immunity. Here, we
predicted both B cell and T cell epitopes for conferring
immunity in different ways, whereas other recent studies of
HCoV reported T cell epitopes only, so our findings are
broader in scope. There are several criteria
that need to be fulfilled by a vaccine candidate epitope, and
our predicted epitope fulfilled all the criteria. The initial
criterion is the conservancy of the epitopes, which was
measured by the IEDB conservancy analysis tool. Among
Figure 4 Kolaskar and Tongaonkar antigenicity prediction of the most antigenic
protein, B2KKT9.

Notes: The x-axis and y-axis represent the sequence position and antigenic
propensity, respectively. The threshold value is 1.0. The regions above the threshold
are antigenic, shown in yellow.




Figure 5 Emini surface accessibility prediction of the most antigenic protein,
B2KKT9.

Notes: The x-axis and y-axis represent the sequence position and surface probability,
respectively. The threshold value is 1.000. The regions above the threshold are
antigenic, shown in yellow.



the five potential T cell epitopes, all possessed the same
conservancy, of 64.29%. We also found the same conservancy,
64.29%, for the B cell epitope. Although all the epitopes had
the same conservancy, the KSSTGFVYF epitope
had the highest number of interactions with the HLA
alleles. A very recent study showed a highly conserved
sequence in the RNA-directed RNA polymerase of HCoVs;
nevertheless, our finding of a spike protein epitope with 64.29%
conservancy among the 56 spike proteins is noteworthy,
and we consider this too as an epitope candidate
for vaccine development.

Population coverage is another important factor in the
development of a vaccine. For all the predicted epitopes, the
cumulative percentage of population coverage was measured.
We found the highest population coverage in South Ireland,
which was 90.73%, followed by Italy and North America,
with 87.13% and 75.68% coverage, respectively. HCoV
was first found in Europe; hence, we also examined the
overall coverage in Europe and found this to be 82.59%.
The Oceania region showed 79.08% coverage. We also recorded 80.31%
population coverage for the East Asian region, considered
one of the hot spots of this viral infection. It has been
reported that in Hong Kong and the People's Republic of
China, during 2001-2002 there were about 587 cases (among
these, 26 children) of acute respiratory disease caused by
different types of HCoV infection. Specifically, in Hong



Table 4 Emini surface accessibility prediction of the peptides

Number   Start position   End position   Peptide   Peptide length
1        30               36             NGYQHNS   7
2        50               55             NSVDNL    6
3        88               94             GPSSQPY   7


Kong, each year this virus caused about 224 hospitalizations
per 100,000 population aged ≤6 years.

However, allergenicity is one of the prominent
obstacles in vaccine development. Today, most vaccines
stimulate the immune system into an "allergic" reaction,
through induction of type 2 T helper (Th2) cells and
immunoglobulin E (IgE). The AllerHunter score value is
the probability that a particular sequence is a cross-reactive
allergen. The threshold for prediction of allergen
cross-reactivity is adjusted such that a sequence is predicted
as a cross-reactive allergen if its probability is >0.06. Here,
our proposed epitope's allergenicity score was 0.02, and
thus it was considered a nonallergen. According to the
FAO/WHO evaluation scheme of allergenicity prediction,
a sequence is potentially allergenic if it either has an
identity of at least six contiguous amino acids or >35 percent
sequence identity over a window of 80 amino acids when
compared to known allergens. Hence, our query epitopes did
Figure 6 Karplus and Schulz flexibility prediction of the most antigenic protein,
B2KKT9.

Notes: The x-axis and y-axis represent the position and score, respectively. The
threshold is 1.0. The flexible regions of the protein are shown in yellow color, above
the threshold value.

Figure 7 Chou and Fasman beta-turn prediction of the most antigenic protein,
B2KKT9.

Notes: The x-axis and y-axis represent the position and score, respectively. The
threshold is 1.041. The regions having beta turns in the protein are shown in yellow
color, above the threshold value.

Figure 8 Bepipred linear epitope prediction of the most antigenic protein,
B2KKT9.

Notes: The x-axis and y-axis represent the position and score, respectively. The
threshold is 0.35. The regions having beta turns are shown in yellow. The highest
peak region indicates the most potent B cell epitope.



not fulfill the criteria of the FAO/WHO evaluation scheme
of allergenicity prediction and were classified by this scheme
as nonallergens.
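
The two decision rules described above can be sketched in a few lines. This is an illustration rather than the AllerHunter or FAO/WHO software: the function names and the allergen strings are hypothetical, and only the six-residue rule is implemented, since the 35% identity rule over an 80-residue window requires an alignment method and a query of at least that length.

```python
def shares_contiguous_identity(query: str, allergen: str, k: int = 6) -> bool:
    """True if query and allergen share an identical stretch of k residues."""
    if len(query) < k or len(allergen) < k:
        return False
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return any(allergen[j:j + k] in query_kmers
               for j in range(len(allergen) - k + 1))


def is_predicted_allergen(score: float, threshold: float = 0.06) -> bool:
    """Apply the probability cutoff quoted in the text (score > 0.06)."""
    return score > threshold


# Hypothetical allergen fragments, not entries from any allergen database.
matching = shares_contiguous_identity("KSSTGFVYF", "MAAKSSTGFLLL")
non_matching = shares_contiguous_identity("KSSTGFVYF", "MAAKSSTGALLL")
epitope_flagged = is_predicted_allergen(0.02)
```

A 9-mer contains only four distinct 6-residue windows, so this check is inexpensive even against a large allergen list.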

Our in silico predictions were based on
diligent analysis of sequences and various immune databases. This
type of study has recently received experimental validation,
and for this reason, we suggest that the proposed
epitope would be able to trigger an efficacious immune
response as a peptide vaccine in vivo.

Conclusion 

Our study has shown that integrated computational 
approaches could be used for predicting vaccine candidates 
against pathogens such as HCoV, with previously described, 
validated procedures. 

In this way, in silico studies save both time and costs for 
researchers and can guide the experimental work, with higher 
probabilities of finding the desired solutions and with fewer 
trial and error repeats of assays. 



Table 5 Bepipred linear epitope prediction

Number   Start position   End position   Peptide         Peptide length
1        10               13             GLKS            4
2        23               35             TGSDVNCNGYQHN   13
3        50               50             N               1
4        52               54             VDN             3
5        76               81             SSSGVL          6
6        85               93             IPFGPSSQP       9
sp|Q6Q1S2|SPIKE_CVHNL 0.00694
tr|H9EJ72|H9EJ72_CVHNL -0.00546
tr|H9EJA0|H9EJA0_CVHNL -0.00247
tr|H9EJ37|H9EJ37_CVHNL 0.00012
tr|I7CMD1|I7CMD1_CVHNL -0.00138
tr|H9EJV2|H9EJV2_CVHNL 0.00026 
tr|H9EJW4|H9EJW4_CVHNL 0.00048 
tr|H9EJ44|H9EJ44_CVHNL 0.0006 
tr|B2KKR6|B2KKR6_CVHNL 0.0051 
tr|Q06Y10|Q06Y10_CVHNL -0.00147 
tr|H9EJ16|H9EJ16_CVHNL -0.00286 
tr|Q06Y16|Q06Y16_CVHNL -0.00992 
sp|P15423|SPIKE_CVH22 0.00192 
tr|H1AG29|H1AG29_CVH22 0.00234 
tr|C1J1E8|C1J1E8_CVH22 0 
tr|C1J1E9|C1J1E9_CVH22 0 
tr|C1J1F3|C1J1F3_CVH22 0 
tr|C1J1E7|C1J1E7_CVH22 0 
tr|C1J1F1|C1J1F1_CVH22 0 
tr|C1J1F0|C1J1F0_CVH22 0 
tr|C1J1F2|C1J1F2_CVH22 0 
tr|H1AG32|H1AG32_CVH22 -0.00703 
tr|H1AG30|H1AG30_CVH22 -0.00256 
tr|H1AG31|H1AG31_CVH22 0.00649 
tr|J9VBQ8|J9VBQ8_CVH22 0.00127 
tr|J9UWK8|J9UWK8_CVH22 0.00384
tr|H1AG33|H1AG33_CVH22 0.0092 
sp|Q14EB0|SPIKE_CVH2 0.00331 
sp|Q0ZME7|SPIKE_CVHN5 0.00335
sp|Q5MQD0|SPIKE_CVHN1 0.07716
sp|P36334|SPIKE_CVHOC 0.16287 
sp|K9N5Q8|SPIKE_CVEMC 0.34185 
tr|H9EJ51|H9EJ51_CVHNL -0.00015
tr|B2KKT7|B2KKT7_CVHNL 0.00744 
tr|B2KKT9|B2KKT9_CVHNL 0.00745 
tr|B2KKS1|B2KKS1_CVHNL 0
tr|B2KKR7|B2KKR7_CVHNL 0 
tr|B2KKS7|B2KKS7_CVHNL 0 
tr|H9EJ86|H9EJ86_CVHNL 0 
tr|B2KKS2|B2KKS2_CVHNL 0 
tr|B2KKS6|B2KKS6_CVHNL 0 
tr|B2KKS5|B2KKS5_CVHNL 0 
tr|B2KKS9|B2KKS9_CVHNL 0 
tr|B2KKT1|B2KKT1_CVHNL 0 
tr|B2KKS8|B2KKS8_CVHNL 0
tr|B2KKT3|B2KKT3_CVHNL -0.00023 
tr|H9EJ93|H9EJ93_CVHNL 0.0023 
tr|B2KKT8|B2KKT8_CVHNL -0.00012 
tr|B2KKT0|B2KKT0_CVHNL 0 
tr|B2KKT2|B2KKT2_CVHNL 0 
tr|B2KKS3|B2KKS3_CVHNL 0 
tr|B2KKR8|B2KKR8_CVHNL 0 
tr|B2KKS4|B2KKS4_CVHNL 0 
tr|B2KKT6|B2KKT6_CVHNL 0
tr|B2KKT5|B2KKT5_CVHNL 0 
tr|B2KKT4|B2KKT4_CVHNL 0



Figure S1 Phylogenetic tree, showing the evolutionary divergence among the different membrane proteins of human coronaviruses.